Interpretive Risk Assessment on GWA Data with Sparse Linear Regression

نویسندگان

  • Ingrid Brænne
  • Kai Labusch
  • Thomas Martinetz
  • Amir Madany Mamlouk
چکیده

Genome-wide association (GWA) studies provide large amounts of high-dimensional data. GWA studies aim to identify variables, i.e., single nucleotide polymorphisms (SNP) that increase the risk for a given phenotype and have been successful in identifying susceptibility loci for several complex diseases. A remaining challenge is however to predict the individual risk based on the genetic pattern. Counting the number of unfavorable alleles is a standard approach to estimate the risk of a disease. However this approach limits the risk prediction by only allowing for a subset of predefined SNPs. Recent studies that apply SVM-learning have been successful in improving the risk prediction for Type I and II diabetes. However, a drawback of the SVM is the poor interpretability of the classifier. The aim is thus to classify based on only a small number of SNPs in order to also allow for a genetic interpretability of the resulting classifier. In this work we propose an algorithm that can do exactly this. We use an approximation method for sparse linear regression problems that has been recently proposed and can be applied to large data sets in order to search for the best sparse risk predicting pattern among the complete set of SNPs.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Robust Estimation in Linear Regression with Molticollinearity and Sparse Models

‎One of the factors affecting the statistical analysis of the data is the presence of outliers‎. ‎The methods which are not affected by the outliers are called robust methods‎. ‎Robust regression methods are robust estimation methods of regression model parameters in the presence of outliers‎. ‎Besides outliers‎, ‎the linear dependency of regressor variables‎, ‎which is called multicollinearity...

متن کامل

Sparse Coding for Feature Selection on Genome-Wide Association Data

Genome-wide association (GWA) studies provide large amounts of high-dimensional data. GWA studies aim to identify variables that increase the risk for a given phenotype. Univariate examinations have provided some insights, but it appears that most diseases are affected by interactions of multiple factors, which can only be identified through a multivariate analysis. However, multivariate analys...

متن کامل

Identifying Genetic Risk Factors via Sparse Group Lasso with Group Graph Structure

Genome-wide association studies (GWA studies or GWAS) investigate the relationships between genetic variants such as single-nucleotide polymorphisms (SNPs) and individual traits. Recently, incorporating biological priors together with machine learning methods in GWA studies has attracted increasing attention. However, in real-world, nucleotide-level bio-priors have not been well-studied to date...

متن کامل

SNPboost: Interaction Analysis and Risk Prediction on GWA Data

Genome-wide association (GWA) studies, which typically aim to identify single nucleotide polymorphisms (SNPs) associated with a disease, yield large amounts of high-dimensional data. GWA studies have been successful in identifying single SNPs associated with complex diseases. However, so far, most of the identified associations do only have a limited impact on risk prediction. Recent studies ap...

متن کامل

Original Articles Prediction of Cardiovascular Disease Outcomes and Established Cardiovascular Risk Factors by Genome-Wide Association Markers

Background—Genome-wide association (GWA) platforms have yielded a rapidly increasing number of new genetic markers. The ability of these markers to improve prediction of clinically important outcomes is debated. Methods and Results—A systematic review was performed of GWA-derived markers associated with cardiovascular outcomes or other phenotypes that represent common established risk factors f...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010